23 research outputs found

    Structured variable selection in support vector machines

    When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure on the SVM to eliminate the influence of irrelevant predictors. The lasso and other variable selection techniques have been used successfully in the SVM to perform automatic variable selection. In some problems, there is a natural hierarchical structure among the variables. Thus, to obtain an interpretable SVM classifier, it is important to respect the heredity principle when enforcing sparsity in the SVM. Many variable selection methods, however, do not respect the heredity principle. In this paper we enforce both sparsity and the heredity principle in the SVM by using the so-called structured variable selection (SVS) framework originally proposed in Yuan, Joseph and Zou (2007). We minimize the empirical hinge loss under a set of linear inequality constraints and a lasso-type penalty. The solution always obeys the desired heredity principle and enjoys sparsity. The new SVM classifier can be fitted efficiently, because the optimization problem is a linear program. Another contribution of this work is a nonparametric extension of the SVS framework, from which we propose nonparametric heredity SVMs. Simulated and real data are used to illustrate the merits of the proposed method. Comment: Published at http://dx.doi.org/10.1214/07-EJS125 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
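The linear-program claim above can be made concrete with a short sketch. The notation here (slack variables \xi_i, the split \beta_j = \beta_j^+ - \beta_j^-, the tuning parameter \lambda, and the parent/child index set \mathcal{H}) is assumed for illustration and is not taken from the abstract; in particular, the heredity constraint in the last line is one plausible linear form, not necessarily the one used in the paper.

```latex
\begin{aligned}
\min_{\beta_0,\,\beta^{+},\,\beta^{-},\,\xi}\quad
  & \sum_{i=1}^{n} \xi_i \;+\; \lambda \sum_{j=1}^{p} \bigl(\beta_j^{+} + \beta_j^{-}\bigr) \\
\text{subject to}\quad
  & y_i \Bigl(\beta_0 + x_i^{\top}(\beta^{+} - \beta^{-})\Bigr) \;\ge\; 1 - \xi_i,
    \qquad \xi_i \ge 0, \quad i = 1,\dots,n, \\
  & \beta_j^{+} \ge 0, \quad \beta_j^{-} \ge 0, \qquad j = 1,\dots,p, \\
  & \beta_k^{+} + \beta_k^{-} \;\le\; \beta_j^{+} + \beta_j^{-}
    \qquad \text{for every (parent } j,\ \text{child } k) \in \mathcal{H}.
\end{aligned}
```

Under this sketch the hinge loss is represented by the slacks \xi_i, the lasso-type penalty by the sum of the non-negative parts, and every constraint is linear in the unknowns. If a parent's two parts are both zero, the last constraint forces the child's coefficient to zero as well, which is the behaviour the heredity principle asks for.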

    GrassmannOptim: An R Package for Grassmann Manifold Optimization

    The optimization of a real-valued objective function f(U), where U is a p × d (p > d) semi-orthogonal matrix such that U^T U = I_d and f is invariant under right orthogonal transformations of U, is often referred to as Grassmann manifold optimization. Manifold optimization appears in a wide variety of computational problems in the applied sciences. In this article, we present GrassmannOptim, an R package for Grassmann manifold optimization. The implementation uses gradient-based algorithms and embeds a stochastic gradient method for global search. We describe the algorithms, provide illustrative examples of the relevance of manifold optimization, and finally show some practical uses of the package.
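The invariance condition in this abstract can be checked numerically on a small example. The sketch below is plain Python and does not use the GrassmannOptim package; the objective f(U) = trace(U^T A U), the matrix A, and the rotation angle are all assumptions chosen only to illustrate a function that depends on the column space of U rather than on U itself.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def objective(U, A):
    """Illustrative objective f(U) = trace(U^T A U)."""
    return trace(matmul(transpose(U), matmul(A, U)))

# Hypothetical symmetric 3 x 3 matrix defining the objective.
A = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 3.0]]

# A 3 x 2 semi-orthogonal matrix (p = 3, d = 2): U^T U is the 2 x 2 identity.
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

# A 2 x 2 rotation, i.e. an orthogonal matrix acting on the right of U.
theta = 0.7
O = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

UO = matmul(U, O)
gram = matmul(transpose(UO), UO)   # still the identity after rotation
f_U = objective(U, A)
f_UO = objective(UO, A)            # same value: f depends only on span(U)
```

Because U and UO span the same d-dimensional subspace, an objective of this kind is really a function on the Grassmann manifold, which is why the search can be carried out over subspaces rather than over individual matrices.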

    Compound Identification Using Penalized Linear Regression on Metabolomics

    Compound identification is often achieved by matching experimental mass spectra to the mass spectra stored in a reference library, based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data are high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, this study proposes two-step approaches that use the dot product and Pearson's correlation along with penalized linear regression.
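The two similarity measures named in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the study's pipeline: it assumes the "dot product" is the length-normalized (cosine) form commonly used for spectra, and the query spectrum, the library entries, and the compound names are made up. How the similarity step is combined with the penalized regression is not specified in the abstract and is not shown here.

```python
import math

def cosine_similarity(x, y):
    """Normalized dot product between two intensity vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson_correlation(x, y):
    """Pearson's correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)

# Hypothetical intensities on a shared grid of five m/z values.
query = [0.0, 10.0, 50.0, 100.0, 5.0]
library = {
    "compound_A": [0.0, 20.0, 100.0, 200.0, 10.0],  # same shape as the query
    "compound_B": [100.0, 5.0, 0.0, 10.0, 50.0],    # different shape
}

# Rank library entries by similarity to the query, best match first.
ranked = sorted(library,
                key=lambda name: cosine_similarity(query, library[name]),
                reverse=True)
best_cosine = cosine_similarity(query, library[ranked[0]])
best_pearson = pearson_correlation(query, library[ranked[0]])
```

Both measures ignore overall intensity scale, so a library spectrum that is a scaled copy of the query scores as a near-perfect match.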

    Bayesian lead time estimation for the Johns Hopkins Lung Project data

    Problem statement: Lung cancer screening using X-rays has been controversial for many years. A major concern is whether lung cancer screening really brings any survival benefit, which depends on effective treatment after early detection. The problem was analyzed from a different point of view, and estimates were presented of the projected lead time for participants in a lung cancer screening program using the Johns Hopkins Lung Project (JHLP) data. Method: A newly developed method of lead time estimation was applied in which the lifetime T is treated as a random variable rather than a fixed value, so that the number of future screenings for a given individual is also a random variable. Using the actuarial life table available from the United States Social Security Administration, the lifetime distribution was first obtained; the lead time distribution was then projected using the JHLP data. Results: The analysis of the JHLP data shows that, for a male heavy smoker with an initial screening age of 50, 60, or 70, the probability of no early detection with semiannual screens is 32.16%, 32.45%, and 33.17%, respectively, while the mean lead time is 1.36, 1.33, and 1.23 years. The probability of no early detection increases monotonically as the screening interval increases, and it increases slightly as the initial age increases for the same screening interval. The mean lead time and its standard error decrease as the screening interval increases for all age groups, and both decrease as the initial age increases for the same screening interval. Conclusion: The overall mean lead time estimated with a random lifetime T is slightly less than that estimated with a fixed value of T. It is hoped that this result will help improve current screening programs.
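The point that a random lifetime makes the number of screenings random can be illustrated with a toy calculation. Everything in the sketch below is an assumption made for illustration: the counting convention (one screen at the initial age and one every interval thereafter, up to the lifetime), the function names, and the three-point lifetime distribution, which is invented and is not the Social Security life table or the JHLP data.

```python
import math

def number_of_screens(initial_age, interval, lifetime):
    """Screens at initial_age, initial_age + interval, ... up to lifetime.

    Assumed convention: the initial screen counts, and a screen scheduled
    exactly at the lifetime still takes place.
    """
    if lifetime < initial_age:
        return 0
    return int(math.floor((lifetime - initial_age) / interval)) + 1

def expected_screens(initial_age, interval, lifetime_distribution):
    """Average the screen count over a discrete lifetime distribution."""
    return sum(prob * number_of_screens(initial_age, interval, lifetime)
               for lifetime, prob in lifetime_distribution.items())

# Made-up lifetime distribution (age at death -> probability).
toy_lifetimes = {70: 0.2, 80: 0.5, 90: 0.3}

# Semiannual screening starting at age 60.
screens_if_fixed_80 = number_of_screens(60, 0.5, 80)
expected_if_random = expected_screens(60, 0.5, toy_lifetimes)
```

With a fixed lifetime the screen count is a single number, whereas with a lifetime distribution it becomes a random quantity that has to be averaged over, which is the modelling change the abstract describes.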

    Protein Expression and Functional Relevance of Efflux and Uptake Drug Transporters at the Blood-Brain Barrier of Human Brain and Glioblastoma.

    Knowledge of transporter protein expression and function at the human blood-brain barrier (BBB) is critical to predicting drug BBB penetration and to designing strategies for improving drug delivery to the brain or to brain tumors. This study determined absolute transporter protein abundances in isolated microvessels of human normal brain (N = 30), glioblastoma (N = 47), rat brain (N = 10), and mouse brain (N = 10), and in cell membranes of MDCKII cell lines, using targeted proteomics. In glioblastoma microvessels, protein levels of the efflux transporters (ABCB1 and ABCG2), monocarboxylate transporter 1 (MCT1), glucose transporter 1 (GLUT1), the sodium-potassium pump (Na/K ATPase), and Claudin-5 were significantly reduced, while large neutral amino acid transporter 1 (LAT1) was increased and GLUT3 remained the same, compared with human normal brain microvessels. ABCC4, OATP1A2, OATP2B1, and OAT3 were undetectable in microvessels of both human brain and glioblastoma. Species differences in BBB transporter abundances were noted. Cellular permeability experiments and modeling simulations suggested that efficient delivery of drugs with poor transmembrane permeability from the blood to the brain requires not a single apical uptake transporter but a vectorial transport system consisting of an apical uptake transporter and a basolateral efflux mechanism.